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1.  Introduction. 


We  imagine  a  simple  processor  placed  at  each  site  s  of 

the  graph.  The  state  of  the  machine  evolves  by  discrete  chages 

and  it  is  therefore  convenient  to  discretize  time,  say  t  =  1,2, 

3,'**.  At  time  t,  the  state  of  the  processor  at  site  s  is  a 

random  variable  X  (t)  with  values  in  A  =  A  =  {0,1,2, *•* ,L-1). 

s  s 

The  total  configuration  is  X(t)  =  (X_  (t) ,X_  (t),***,X  (t) ) , 

S1  s2  SN 

which  evolves  due  to  state  changes  of  the  individual  processors. 

The  starting  configuration,  X(0),  is  arbitrary.  At  each  epoch, 

only  one  site  undergoes  a  change,  so  that  X(t-l)  and  X(t)  can 

differ  in  at  most  one  coordinate.  Let  ni»n2'***  be  the  sequence 

in  which  the  sites  are  visited  for  replacement:  thus  nfc  €  S  and 

X  (t)  *  X  (t-1),  i  ^  n, .  Each  processor  is  programmed  to 
si  si  t 

follow  the  same  algorithm:  at  time  t,  a  sample  is  drawn  from 

the  local  characteristics  of  not  necessarily  Gibbs  measure 

for  s  =  nfc  and  oj  =  X(t-l).  In  other  words,  we  choose  a  state 

x  ^  A  from  the  conditional  distribution  of  X  given  the 
nt  nt 

observed  states  of  the  sites  Xr(t-1),  r  4  nfc.  The  new 

configuration  X(t)  has  X  (t)  =  x  and  X  (t)  =  X  (t-1),  s  4  n.  . 

nfc  s  s  t 

Given  an  initial  configuration,  X(0),  we  obtain  a  sequence 

X (1) ,  X(2),***  of  configurations  which  converge  to  a  limit 

distribution  ir^  of  nt.  The  limits  obtained  do  not  depend  on 

X(0).  When  TTj_  is  Gibbs  measure,  we  can  apply  our  theorems  to 

the  annealing  restorations  of  degraded  images. 

The  author  wishes  to  express  his  gratitude  to  Prof.  H. 

Umegaki  and  Prof.  C.  R.  Baker  for  interest  in  and  valuable 

comments  to  the  present  work. 


2.  Preliminaries. 


Let  S  =  • • • 'SN)  be  a  set  of  sites  and  let  G  = 

{G  ,  s  e  S}  be  a  neighborhood  system  for  S,  meaning  any 
s 

collection  of  subsets  of  S  for  which  1)  s  4  G  and  2)  s  6  G 

T  s  r 

<=£  r  €  G  .  Obviously,  G_  is  the  set  of  neighbors  of  s  and 
s  s 

the  pair  {S,G}  is  a  graph  in  the  usual  way.  A  subset  CCS 

is  a  clique  if  every  pair  of  distinct  sites  in  C  are  neighbors; 

C  denotes  the  set  of  cliques.  Let  X  =  {X  ,  s  €  S}  denote  any 

s 

family  of  random  variables  indexed  by  S.  For  simplicity,  we 
can  assume  a  common  state  space,  say  A  =  {0,1,2, ••• ,L-1) , 
so  that  Xg  C  A  for  all  s.  Let  A  be  the  set  of  all  possible 
configurations : 

n  =  (a)  =  (x  ,***,x  )  :  x  e  A,  1  <  i  <  N}. 

S1  SN  si  “  ~ 

As  usual,  the  event  {X  =  x  , •♦*,X  =  x  }  is  abbreviated 

S1  S1  SN  SN 

{X  «  a)}. 

X  is  a  Markov  random  field  (we  abbreviate  MRF)  with  respect  to 
G  if 

P(X=w)  >  0  for  all  w  C  R;  (2.1) 

p  (xs=xs  I  Xr=xr '  r+s)  =  P(Vxs|Xr=xr'  r  e  6s)  (2*2) 

for  every  s  €  S  and  (x  ,***,x  )  €  H.  Technically,  what  is 

S1  SN 

meant  here  is  that  the  pair  {X,P>  satisfies  (2.1)  and  (2.2) 
relative  to  some  probability  measure  on  fl.  The  collection  of 
functions  on  the  left-hand  side  of  (2.2)  is  called  the  local 
characteristics  of  the  MRF  and  it  turns  out  that  the  joint 
probability  distribution  P(X«u>)  of  any  process  satisfying 
(2.1)  is  uniquely  determined  by  these  conditional  probabilities. 


A  Gibbs  distribution  with  respect  to  {S,G}  is  a  probability 
measure  it  on  ft  with  the  following  representation: 

TT  ((d)  =  ~  exp(-U(w)/T) 

where  Z  and  T  are  constants  and  U,  called  the  energy  function, 
is  of  the  form 

U  (to )  =  l  V_(u>). 

C  £C 

Each  Vc  is  a  function  on  ft  with  the  property  that  Vc(w)  depends 

only  on  those  coordinates  x  of  w  for  which  s  €.  C.  Such  a 

s 

family  {vc,  C  &  C}  is  called  a  potential.  Z  is  the  normalizing 
constant: 

Z  =  \  exp (-U(w)/T) 

u> 

and  is  called  the  partition  function.  T  stands  for  "temperature" 
for  our  purposes,  T  controls  the  degree  of  "peaking”  in  the 
"density"  tt.  The  following  proposition  gives  the  equivalence 
between  MRF  and  Gibbs  distribution.  For  a  proof  see  [1] ,  [6] . 

PROPOSITION  1.  Let  G  be  a  neighborhood  system.  Then 
X  is  an  MRF  with  respect  to  G  if  and  only  if  tt  (oj)  =  P(X=w) 
is  a  Gibbs  distribution  with  respect  to  G. 

We  define  a  sample  process  {X(t);  t  =  0,1, 2,***}  in  the 
following:  Let  n1,n2,***  be  the  sequence  in  which  the  sites 
are  visited  for  updating.  The  initial  configuration  is  X(0). 

The  evolution  X(t-l)  -*■  X(t)  of  the  system  is  defined  by 

(1)  X(t-l)  and  X(t)  can  differ  in  at  most  one  coordinate  nfc 

(2)  (X(t);  t  =  0,1,2,* *•}  is  generated  by 


P(Xs(t)-xs,  s  e  S)  =  V’SiJV  s+nt)p(Xs(t-l)=xs,  s+nt) 

where  tt^  is  a  probability  measure  on  Q  such  that  0  <  Trt(w)  <  1 
for  every  u>  £  ft. 


Let  II  u  -  v!1  denote  the  L^-distance  between  two  distributions 


on  ft: 


y  —  v ||  =  1 1  y  (oj)  -  v  (w).  | . 


Obviously  yn~*-  y  (n  +»)  in  distribution  (i.e. ,  yn(u)) 
for  every  w)  if  and  only  if  [ |  y  —  y  ||  -►  0  (n  ■*  +») . 

We  assume  that  ir  has  a  limit  irffl  as  t  +  +°°. 


y  (w) 


3.  Convergence  Theorem. 


In  this  section  we  state  the  main  theorem  and  give  the 
proof. 

THEOREM  1.  Assume  that  there  exists  an  integer  t  ,>  N 
such  that  for  every  t  =  0,1,2,* ••  we  have 

S  C{nttl,nt+2,---,nt+T>. 

Suppose  that 

6(t)  =  inf  {it  (x  |  x  ,  j+i);  1  <  i  <  N,  (x  , • •  •  ,x  )  £  ft} 

t  si  Sj  -  sx  sN 


for  some  constant  C  >  0. 

Furthermore,  suppose  that  tt^  (go)  is  monotone  for  all  t  ^  t' 
for  some  integer  t'  and  for  all  <o  €  ft.  Then  for  any  starting 
configuration  n  €  ft  and  for  every  to  £  ft, 

lim  P(X(t)=u|X(0)=n)  *  it  (to)  . 
t  -*■  +°° 

We  need  the  following  two  lemmas  to  prove  the  above  theorem. 


LEMMA  1.  For  every  tn  =  0,1,2,***, 


lim  sup  |P(X(t)=(o|x(tn)=n' )  -  P(X(t)=io|x(t,J=Tr)  |  =  0 

t  +°°  w,n '  ,n" 


PROOF.  Fix  tn  =  0,1,2,* **  and  define  T,  =  tn  +  kx ,  k  =  0 


1,2, 


By  hypothesis  S  C.  {n 


t+1' 


,nfc+T)  for  all  t. 


Now  fix  k  for  the  moment  and  define  the  m^  as  follows: 


iik  =  sup { t ;  t  <  T^,  nt=si),  1  <  i  <  N. 


We  assume  that  m.  >  nu  >  • • •  > 


V 


Then 


P(X(Tjc)=aj|X(Tk_1)=aj'  ) 


=  P  (X 


=  P  (X 


N 


S  (Tk)=Xfi  <T.)=X«  |X(TV_,  )=(*>') 

S1  K  S1  SN  K  SN  K  1 

sx (ml)-xs1 ' • • • ' xsN (mN» 'xsN I x (Tk-l >  ■ ■ ’  1 


=  n  P (X  (m . ) =x 
j=l  sj  3  sj 


N 


Xs  (mi+l)=Xs  '*"'XS  (mN)=Xs  'X(Tk-l)=a,,) 
Sj+1  3+1  Sj+1  SN  W  SN  k  1 


>  n  6 (m. )  >  CN(tn  +  kT)-1. 

j=l  3  ~  0 


We  obtain 


inf  P(X(Tk)«w|X(T.  ,)=(!)•  )  >  CN(t0  +  kT)_1  (3.1 

co,  oo '  ** 

for  every  tg  =  0,1,2,***  and  k  =  1,2,***,  bearing  in  mind  that 
depends  on  tg.  For  each  t  >  0,  we  define  K(t)  =  sup{k; 

<  t}  so  that  K  (t )  -*■  +»  as  t  +  +°°. 

For  fixed  t  >  T^, 


sup  |p(x(t)=w|x(0)=n' )  -  P(x(t)=u>|x(0)=n")  | 
u,n' #n" 

=  sup  {sup  P(x(t)=u>|x(0)=n)  -  inf  p (x (t)  — uj | x (0 )  =n) } 
a)  n  n 

=  sup  {sup  l  P  (X  ( t )  =u  | X  (T.  )  =co 1 )  P  (X  (T,  )  =0) 1  | X  (0)  =  ji) 
u  n  w'  1 

-  inf  l  P(X(t)=oj|X(T1)=u),)P(X(T1)=w'  |  X  (0)  =n)  > 
n  u)  •  1  1 

=  sup  ft  (t  ,co)  . 

GJ 

Certainly,  for  each  co  6  ft, 


sup  l  P(X(t)=a)|X(T1)=w,)P(X(T1)=u)' |X(0)=n) 

n  co ' 

<  sup  l  P(X(t)-w|x(T.)-u),)y(w') 


7 


m 


where  the  supremum  is  over  all  probability  measure  y  on  ft  which 

N  -1 

by  (3.1)  »  are  subject  to  y(to')  £  C  (tg  +  k-r)  for  every  to'  £  ft. 
Suppose  to'  -*•  P(X(t)=to  jX(T^)=to' )  is  maximized  at  to'  =  to*  (which 
depends  on  to) .  Then  the  last  supremum  is  attained  by 

y  (to*)  =1-  (LN-l)CN(t0  +  kx)_1 

y(to')  =  CN(tQ  +  kT)  ^ ,  to'  4  to*. 

The  value  so  obtained  is 

(1-  (LN-l)CN(tQ  +  kT)“1)P(X(t)=to|x(T1)=to*) 


N  — 1 

+  Cw(t0  +  kT) 


to  '  Tto* 


P  (X  ( t )  =to  X(T1)=to')  . 


Similarly, 


inf  l  P  (X  ( t )  =to  |  X  (T. )  =to 1 )  P  (X  (T-,  )  =to 1  I X  (0 )  =ri ) 
n  to* 

>  inf  l  P  (X  (t )  =to  |X(T1)=(o,)y((o,) 
y  to'  1 

where  the  infimum  is  over  all  probability  measure  y  on  ft  which, 
by  (3.1),  are  subject  to  y(to')  ^  CN(tQ  +  kT)_1  for  every  to'  6  ft. 
Suppose  to'  ■+  P(X(t)=tojx(T^)=to' )  is  minimized  at  to'  =  to*  (which 
depends  on  to)  .  Then  the  last  infimum  is  attained  by 

y  (to*)  =1-  (LN-l)CN(t0  + kT)"1 

y(to')  =  CN(t0  +  ki)-1,  to'  =f  to*. 

The  value  so  obtained  is 

(1  -  (LN-l)CN(tQ  +  kT)_1)P(X(t)=to|x(T1)=to*) 


N  -1 

+  C  (tQ  +  kT)  ± 


P(X(t)=to|X(T1)=co') 


It  follows  that 


ft  (t  ,to)  < 


(1-  (LN-l)CN(tQ  +  ki)"1)  {P(X(t)=to|X(T1)=to*)  -  P(X(t)=tojX(T1)=to*)  } 


8 


LEMMA  2.  lim  sup  ||P(t,*  tn,TT  )  -  ir  |  =  0. 

y  oo  co 


tQ  -*■  +°°  t  ^  tQ 


PROOF.  The  probability  measures  P(t,*|t  tt  )  figure 

U  00 

prominently  in  the  proof,  and  for  notational  ease  we  prefer  to 

write  P.  (•)  such  that  for  any  t>t.>  0  we  have 
r0,t  -  u- 

f(w)  =  l  p(x(t)=cojx(tn)=n)tT  (n). 
o'  n 


To  begin  with,  we  claim  that  for  any  t>t0>0, 

Hpt0,t  - 

Assume  for  convenience  that  nt  =  s. .  Then 


(3.3 


I! pt  t  "  "tl 
r0'  t 


•••X  )|lt,X.1'V  s+sl,Pt0,t-l(Vxs'  s*sl> 

sl  SN 

-  7Tt  (xs,  S£S)  | 


l  {I  M*  lXc'  ®4si> 

•  *  -  X„  )  x_  6  A  c  sl  s  1 


SN  S1 


lpt0,t-l(xs=xs'  s+sl)  “W  s+sl)  I  > 

l  ipt  t-l(Xs=Xc'  S+S,  )  -  IT .  (x  ,  S+S,  )  I 

(x„  -.-X  )  1  s  s  1  t  s  1 


s2  SN 


l  {  pt  ,t-l(xs=xs'  s  e  S)  -  TTt  (xsf  S6S)}| 
x  0 


(x  ...xe  )  X  ’-O'1'  A  r  5 

2  SN  S1 

=  E  IPt  t-l(X«?=X«?'  S6S)  -  TT  (Xe,  S£S)| 

(x  ...xo  )  t0't  1  s  s  1  s 

sl  SN 

llpt0,t-l  "  71 1  11- 


Fix  t  >  tQ  £  0  , 


P.  .  -  TT 

1  to,t  “ 


=  1|Pt0,t"Wtll  +  ll^t  "  ^ool! 


m 

-r  > '  V  V.v.v 

M  i  * 

MW'v' 

M 

WWWWJHWS'Ji'J? 

Hiift.  ^A'vO.V.V.  k*- 

10 


-  II  pt0,t-l "  II +  II  *t  “  II  by  (3-3) 

11  pt0,t-l  ■  wt-l 11  + 11  "t-1  “  *t  II +  II  *t  ■ "»  II 

-  II  Pt0,t-2  ■  ^t-lll  +  II  "t-1  ’  "t  II  +  II  "t  ■  "»  II 

i  II  PtQ/t-l  •  VJI  +  H  "t-2  "  \-l  II  +  H  "t-1  “  "t  II  +  II  71 1  ' 


t-1 


Proceeding  in  this  way, 

ii  pVt  - H  i  H  pt0,t0  -  "tji  +  Jt  ii  "k  -  "k+i  ii +  ii  "t  *  *. 


Since  P.  .  =  tt  and  ||tt.  -  it  ||  -*■  0  as  t  -*■  +°°,  we  have 

t0,t0  ii  t  « " 


lim  sup  ||  P  -  tt  || 
trt  •*>  »  t  >  t„ 


’0  w-  "0 

t-1 

<  lim  sup  ||  TT  -  IT  ||+  lim  sup  l  II  it  -  tt 

.  V  i.  ^*0  J_  I  v  1  1 _ i.  1  ^ 


t0^W  t>tQ 


t0^“  t-to  k=t0 


+  lim  sup  ||  TT  -  Tim  || 

X.  V  M  1.  v  X 


k+1 


t  -  +  »  t> t ^ 
0  “0 


lim 

t-1 

sup  l 

to-'“ 

t  >  tg  k=t 

00 

lim 

I  II  *k- 

1- _ X.  ^ 

'k  k+1 


=  0. 


Because  by  assumption,  £  |j  ^k“^jc+1||  <  «•  This  completes  the 

k=t0 


proof  of  Lemma  2. 


PROOF  of  THEOREM  1.  For  any  n  £  fl, 


Urn  ||  P  (X  (t )  *  •  |  X  (0 )  *=r»)  -  tt  || 

t  -*■  OO 


lim  Hm  ||  l  P(t,*  |t0,^,)P(t0,n,  1 0 , n)  -  TTe 

tn  t  n '  u  u 

t  >  tn 
“  0 


Q.E.D. 
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<  Tim  lim  || l  p(t,*  |t0,n' )P(t0,n*  |o,n)  -  P(t,*  I tQ , tt^)  || 

t  _  -*■<*>  t  "*• 00  X] ' 
t>tQ 

+  Tim  ||P(t,*  (t.,^)  -  ttJ|. 

f  4°0  t  +  00 
0 

tito 

The  last  term  is  zero  by  Lemma  2.  Furthermore,  since  P(tQ,*|0,n) 

and  it  has  total  mass  1,  we  have 
00 

||  IfP(t,*  |t0,n,)P(t0,n'  |o,n)  -  p(t,«  It^nj  || 

=  l  I  I  P(t,o)|t0,n' )P(t0,n' |o,n)  -  Ptt^ltQ,^^)  | 

w  n ' 

=  l  sup  |  l  {p(t,wftft,n')  -  P(t,a)|tn,n") } 

w  n"  n ' 

{p(t0,n'  |o,n)  -  ■n0O(n')  >  | 

<  l  sup  l  j p (t , a) |  tQ , n 1 )  -  p  (t ,u) 1 1_ , n" )  | 
w  n"  n'  1 

|p(t0,n'  |o,n)  -  *„(*!')  | 

<1  sup  |p(t,0)|tn,n’)  -P(tru)|t0,n")  | 
w  n' ,n" 

l  |p(t0,n'  |o,n)  -  tt  (n ' )  I 
n' 

<2  \  sup  |p(t,w|tn,n' )  -  P(t,a)|tn,n")  | . 
w  n‘  /Ti" 

Finally  then 


Tim  ||  P<X(t)«- |X(0)-n)  -  irj| 

t  °° 


<2  l  lim  lim  sup  |P  (t  ,u> |  tQ ,n ' )  -  P  (t  ,w  |  tQ, n" )  | 
a)  tn  »  t  -*  °°  n’  ,n" 

^  ^  V  i. 


=  0  by  Lemma  1. 


This  completes  the  proof  of  Theorem  1. 


Q.E.D. 


We  obtain  the  following  corollaries. 

COROLLARY  1.  Suppose  that  the  sampler  process  (X(t);  t  = 
0,1,2,***}  is  generated  by 

P(Xg(t)=xs,  s  €  S)  =  Ti  (xn  |xg,  s4nt)P(Xg  (t“l)=xg,  s4nt) 

where  it  is  a  probability  measure  on  ft  such  that  0  <  ir  (w)  <  1 
for  every  w  G  ft.  Assume  that  for  every  s  €  S,  the  sequence 
(nt,  t^l}  contains  s  infinitely  often.  Then  for  any  starting 
configuration  n  £  ft  and  for  every  u  6  ft, 

lim  P(X(t)=u)|x(0)=ri)  *  tt  (o>)  . 

t  -*■  °° 

Concerning  ergodicity,  we  use  the  sampler  process  and 
compute  a  time  average  of  the  function  Y. 

COROLLARY  2.  Suppose  that  the  sampler  process  {X(t);  t  = 

0 , 1 , 2 ,  •  •  • }  is  generated  by 

P(Xg(t)=xs,  s€S)  =  Tr(xn  |xg,  s4nt)P(Xg(t~l)=xs,  s=j=nt) 

where  n  is  a  probability  measure  on  ft  such  that  0  <  tt  (oj)  <1 
for  every  w  £  ft.  Assume  that  there  exists  an  integer.!  .>  N  such 
that  for  every  t  =  0,1,2,* **  we  have 

s  C  tnt+i ,nt+2 '  “  * ' nt+T *  * 

Then  for  every  function  Y  on  ft  and  for  every  starting 
configuration  r)  G  ft, 

,  n 

lim  i  l  Y  (X  ( t ) )  =  l  Y(u>)it(u>) 
n  «  t=l  u 

holds  with  probability  one. 


4.  Applications  to  Image  Processing. 


Let  S  =  Zm  *  {(i,j):  l<:i,j<m}  denote  the  mxm  integer 

lattice;  then  F={F.  . (i,j)  6  S,  denotes  the  gray  levels 

1 f  D 

of  the  original,  digitized  image.  Lowercase  letters  will  denote 

the  values  assumed  by  these  random  variables.  It  is  natural  to 

expect  that  the  image  value  at  a  pixel  does  not  depend  on  the 

image  data  outside  its  neighborhood,  when  the  image  data  on  its 

neighborhood  is  given.  Specially,  we  model  F  as  an  MRF,  or, 

what  is  the  same  (see  Proposition  1) ,  we  assume  that  the 

probability  law  of  F  is  a  Gibbs  distribution  with  respect  to 

{S,G}  with  corresponding  energy  function  U  and  potentials  {Vc}. 

Let  H  denote  the  blurring  matrix  corresponding  to  a  shift- 

invariant  point-spread  function.  The  formulation  of  F  gives 

rise  to  a  blurred  image  H(F)  which  is  recorded  by  a  sensor. 

The  latter  often  involves  a  nonlinear  transformation  of  H (F) , 

denote  here  by  <J> ,  in  addition  to  random  sensor  noise  N  =  (n- 

xi  D 

which  we  assume  to  consist  of  independent,  and  for  definiteness, 
Gaussian  variables  with  mean  y  and  standard  deviation  a. 

We  also  assume  that  F  and  N  are  independent  as  stochastic 
processes.  The  degraded  image  is  then  a  function  of  4>  (H (F) ) 
and  N,  say  ^  ( <#>  (H  (F) )  ,N) ,  for  example,  addition  or  multiplication. 
To  compute  the  posterior  distribution,  we  only  need  to  assume 
that  b  <J>(a,b)  is  invertible  for  each  a.  For  notational  ease, 
we  will  write 

G  =  <MH(P))  ©  N 

At  the  pixel  level,  for  each  (i,j)  €  S, 


Since  the 


G±  ,  -  <M  I  H (i-k, j- A) F.  .)  0  n 
(k,  £,)  *'*’  ^ 

operation  0  is  assumed  invertible. 


i»  j* 

we  can  write 


N  =  $  (G,  <J>  (H  (F)  ) )  «  {*s,  S  €  S} 
to  indicate  this  inverse. 

Let  H  ,  s  €.  S,  denote  the  pixels  which  affect  the  blurred 
s 

image  H  (F)  at  s.  Observe  that  4>  ,  s  €.  S,  depends  only  on  g 
^  s  s 

and  {f.,  t  €  H  }.  By  the  shift-invariance  of  H,  H  *  s  +  H 

t  s  r+s  jt  i 

where  HrG  S,  s  +  r  €  S,  and  s  +  Hr  is  understood  to  be  intersected 

with  S,  if  necessary.  In  addition,  we  will  assume  that  (H  } 

s 

is  symmetric  in  that  r  +  s  6.  H  $=$  -r  +  s  6H  .  Then  the 

8  S 

collection  {H  \{s};  s  €  S}  is  a  neighborhood  system  over  S. 
s 

2 

Let  H  denote  the  second-order  system,  i.e., 

H2  =  U  H  ,  s  €.  S. 
r  €  Hg  r 

2 

Then  it  is  not  hard  to  see  that  {H  \{s},  sGS}  is  also  a 

s 

neighborhood  system.  Finally,  set  Gp  «  {Gp,  s  S }  where 

s 

Gp  =  G  H2 \ { s } .  Let  y  6FN(N  =  m2)  have  all  components  =  y 

s  s  s 

N  2  N 

and  let  ||*||  denote  the  usual  norm  in  F  :  |(x||  *  [  x.. 

i=l  1 

PROPOSITION  2.  For  each  g  fixed,  P(F=f|G=g)  is  a  Gibbs 
distribution  over  {S,GP}  with  energy  function 

Up(f)  =  U(f)  +  ||  y  -  $(g,<MH(f  )))||2/2o2. 

For  a  proof  see  [3].  The  posterior  distribution  P(F*f|g) 
is  a  powerful  tool  for  image  analysis;  in  principle,  we  can 
construct  the  optimal  estimator  for  the  original  image,  examine 
images  samples  from  P(F**f|g),  estimate  parameters,  design  near- 
optimal  statistical  tests  for  the  presence  or  absence  of  special 
objects,  and  so  on.  Specially,  our  work  here  is  to  find  the 
value (s)  of  f  which  maximize  the  posterior  distribution  for  a 
fixed  g,  i.e.,  minimize 


Up(f)  *  U(f)  +  ||y- $(g,4>(H(f)))||2/2<?2,  £€« 
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where  $  is  defined  by  $(H(f))©4>  =  g. 

In  other  words,  we  find  the  value (s)  of  f  with  lowest  energy. 

So  far,  we  have  not  discussed  how  these  realizations  from 
this  class  of  Gibbs  distributions  are  generated.  Basically 
there  are  two  well-known  methods  of  generating  realizations 
from  MRF's  or  Dibbs  distribution's.  They  are  the  "exchange" 
type  and  the  "spin-flip"  type  algorithms.  In  the  exchange 
type  algorithm,  also  known  as  the  Metropolis'  Algorithm  15), 
two  pixels  are  chosen  at  random.  Their  values  are  exchanged 
if  they  are  different  and  if  the  exchange  will  take  the 
system  to  a  more  probable  (lower  energy)  configuration.  If  the 
new  configuration  is  less  probable  then  the  exchange  will  or 
will  not  take  place  depending  on  the  comparison  of  the  ratio 
of  the  probabilities  of  the  new  and  the  old  configurations  with 
a  random  number  uniform  on  (0,1).  The  randomization  is  necessary 
to  ensure  that  the  system  does  not  get  stuck  in  a  local  high 
probability  configuration.  The  ratio  of  the  probabilities  of 
the  new  and  the  old  configurations  are  calculated  easily  due  to 
the  Gibbs  distribution  formulation,  without  actually  determining 
either  of  the  probabilities,  which  would  be  extremely  difficult. 

It  is  well  known  that  this  algorithm  will  converge  to  a 
configuration  that  maximizes  the  joint  probability  but  the  rate 
of  convergence  is  a  difficult  problem  of  statistical  physics 
and  a  completely  satisfactory  solution  to  this  problem  does 
not  exist.  As  would  be  expected,  the  initial  configuration  does 
not  influence  the  convergence  properties  of  the  algorithm. 

It  might  only  take  a  few  more  iterations  to  converge  for 
certain  initial  configurations  as  compared  to  others.  The 
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Metropolis'  Algorithm  has  been  used  by  Cross  and  Jain  [2]  in 
generating  textures  using  MRF  models.  In  a  recent  paper, 
Kirkpatrick,  Gelatt  and  Vecchi  [4]  proved  a  stochastic 
approximation  method  for  solving  combinatorial  optimization 
problems,  which  can  be  used  for  the  minimization  problem. 

A  major  drawback  of  the  Metropolis'  Algorithm  is  that  the 
number  of  pixels  in  each  gray  level  does  not  change  during 
iterations  of  the  algorithm,  due  to  the  fact  that  new 
configurations  are  generated  simply  by  exchanging  two  pixel 
values.  Therefore  we  choose  to  use  the  second  method  for 
generating  realizations  from  a  MRF  (or  Gibbs  distribution) . 

The  second  set  of  algorithms  for  generating  realizations 
from  a  MRF  (or  Gibbs  distribution)  are  known  as  "spin-flip" 
algorithms.  Recently  a  version  of  this  algorithm  is  presented 
by  Geman  and  Geman  [3]  in  an  image  processing  context.  This 
algorithm,  which  they  called  the  "Gibbs  Sampler",  works  as 
follows.  A  pixel  is  chosen  at  random  or  in  a  deterministic 
manner.  The  value  of  the  pixel  is  renewed  by  disregarding 
its  present  value,  noting  the  values  in  its  neighborhood 
and  replacing  the  pixel  value  by  a  random  number  generated 
according  to  the  conditional  distribution  specified  by  the 
MRF  (or  Gibbs  distribution) .  Pixel  visiting  mechanism  can 
be  random  or  deterministic,  such  as  raser  scan,  the  impotant 
point  being  that  each  pixel  should  be  visited  infinitely  often 
as  the  algorithm  proceeds  ad  infinitum.  This  algorithm  has 
the  characteristics  of  a  relaxation  algorithm  in  image 
processing;  therefore,  it  is  also  call  stochastic  relaxation. 

In  this  section  we  use  the  sampler  process  stated  in 
§2  to  find  the  value (s)  of  f  €  ft  with  lowest  energy. 


Let  us  indicate  the  dependence  of  n  on  T  by  writing  irT,  and  let 
T (t)  denote  the  temperature  at  stage  t. 

We  assume  that  is  Gibbs  distribution.  Let 

ftQ  *  {to  €  ft:  U (to)  =  min  U(n)) 

and  let  tTq  be  the  uniform  distribution  on  ftQ. 

Finally  define 

U*  =  max  U  (cd)  , 
w 

U*  =  min  U(w) , 

ID 

A  =  U*  -  U* 

The  sampler  process  {X(t);  t  =  0,1,2,***}  is  generated  by 
P(Xg(t)=xs,  s  £  S)  =  irT(tJ  (xn  |xg,  s4nt)P(X(t-l)=xs,  s4nt)  . 

We  obtain  the  following 

THEOREM  2.  Assume  that  there  exists  an  integer  x  > N  such 
that  for  every  t  =  0 , 1 , 2 , * • *  we  have 

s  C(nt+1,nt+2,---,nt+t). 

Let  T (t )  be  any  decreasing  sequence  of  temperatures  for  which 

a)  T(t)  +  0  as  t  +  » 

b)  T  (t)  >  NA/log  t  for  all  t  >  tg  for  some  integer  tQ>t2. 

Then  for  any  starting  configuration  n  €  ft  and  for  every  u>  ft, 

lim  P(X(t)=w  |  X  (0 )  =n)  -  tt0(u>). 
t  00 

OUTLINE  OF  PROOF.  We  replace  and  in  Theorem  1  for 
7TT(t)  and  Tr0  in  Theorem  2.  By  assumption,  we  obtain  the 
inequality 
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I 


m 


It  is  easy  to  show  the  convergence  of  to  7,o*  Because 


»T(t)(M)  = 


exP (" 


l  exp(-  S.jgyl)  +  l 


to  ’  €  £1 


to  *  •£ 


ovn  l  0(«')  » 

exp  (  fTtT") 


n0l+  I  exp(- 
0  u'6  n\n„  Tm 


converges  to  as  t->-<®. 


We  obtain  Theorem  2  as  the  result  corresponding  to  Theorem  1. 


Q.E.D. 


wxtxrhi 


References 


[1]  J.  Besag,  Spacial  interaction  and  the  statistical  analysis 
of  lattice  systems,  J.  Royal  Stat.  Soc. ,  B  34(1972), 

75-83. 

[2]  G.  C.  Cross  and  A.  K.  Jain,  Markov  random  field  texture 
models,  IEEE  Trans.  PAMI ,  5(1983),  25-39. 

[3]  S.  Geman  and  D.  Geman,  Stochastic  relaxation,  Gibbs 
distribution,  and  the  Bayesian  restoration  of  images, 

IEEE  Trans.  PAMI,  6(1984),  721-741. 

[4]  S.  Kirkpatrick,  C.  D.  Gelatt  and  M.  P.  Vecchi,  Optimization 
by  simulated  annealing.  Science,  220(1983),  671-680. 

[5]  N.  Metropolis  et.  al. ,  Equation  of  state  calculations  by 
fast  computing  mechanics,  J.  Phys.  Chem. ,  21(1953), 
1087-1092. 

[6]  F.  Spitzer,  Markov  random  fields  and  Gibbs  ensembles, 

Amer.  Math.  Monthly,  78(1971),  142-154. 


